The business task

The end goal of this project is to provide insights through data analysis that help all stakeholders (Lily Moreno, the Cyclistic marketing team and the Cyclistic executive team) design marketing strategies aimed at converting casual riders into annual members.

To do that, three questions need to be addressed: 1. How do annual members and casual riders use Cyclistic bikes differently? 2. Why would casual riders buy Cyclistic annual memberships? 3. How can Cyclistic use digital media to influence casual riders to become members?

This project answers the first question only and provides a report with the following deliverables: 1. A clear statement of the business task 2. A description of all data sources used 3. Documentation of any cleaning and manipulation of data 4. A summary of the analysis 5. Supporting visualizations and key findings 6. Top three recommendations based on the analysis

The data sources used

For this project, we use the previous 12 months (2021-04 to 2022-03) of Cyclistic’s historical trip data, provided by Motivate International Inc. under this license, to analyze and identify trends.

Documentation of data cleaning or manipulation

  1. Import data from 2021-04 to 2022-03
library(readr)
X202104_trip_data <- read_csv("202104_trip_data.csv")
## Rows: 337230 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202104_trip_data)
library(readr)
X202105_trip_data <- read_csv("202105_trip_data.csv")
## Rows: 531633 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202105_trip_data)
library(readr)
X202106_trip_data <- read_csv("202106_trip_data.csv")
## Rows: 729595 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202106_trip_data)
library(readr)
X202107_trip_data <- read_csv("202107_trip_data.csv")
## Rows: 822410 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202107_trip_data)
library(readr)
X202108_trip_data <- read_csv("202108_trip_data.csv")
## Rows: 804352 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202108_trip_data)
library(readr)
X202109_trip_data <- read_csv("202109_trip_data.csv")
## Rows: 756147 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202109_trip_data)
library(readr)
X202110_trip_data <- read_csv("202110_trip_data.csv")
## Rows: 631226 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202110_trip_data)
library(readr)
X202111_trip_data <- read_csv("202111_trip_data.csv")
## Rows: 359978 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202111_trip_data)
library(readr)
X202112_trip_data <- read_csv("202112_trip_data.csv")
## Rows: 247540 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202112_trip_data)
library(readr)
X202201_trip_data <- read_csv("202201_trip_data.csv")
## Rows: 103770 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202201_trip_data)
library(readr)
X202202_trip_data <- read_csv("202202_trip_data.csv")
## Rows: 115609 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202202_trip_data)
library(readr)
X202203_trip_data <- read_csv("202203_trip_data.csv")
## Rows: 284042 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202203_trip_data)

Findings: All 12 data frames have the same structure

  1. Combine data from 2021-04 to 2022-03 for easier data cleaning and manipulation
trip_data_combined<-rbind(X202104_trip_data,X202105_trip_data,X202106_trip_data,X202107_trip_data,X202108_trip_data,X202109_trip_data,X202110_trip_data,X202111_trip_data,X202112_trip_data,X202201_trip_data,X202202_trip_data,X202203_trip_data)

Findings: The combined data frame contains many “NA” values.
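
The import and combine steps above could be condensed into a loop. This is only a sketch: `read_trip_files` is a hypothetical helper that is not part of the original analysis, and it uses base R's `read.csv` so that it runs without extra packages (swapping in `readr::read_csv` would keep the datetime parsing shown in the import output). The two demo files are made up.

```r
# Build the 12 month labels: "202104", "202105", ..., "202203".
month_labels <- format(seq(as.Date("2021-04-01"), by = "month", length.out = 12), "%Y%m")
file_names <- paste0(month_labels, "_trip_data.csv")

# Hypothetical helper: read every listed file that exists in `dir`,
# check that all files share the same column names, then stack them.
read_trip_files <- function(dir, files) {
  paths <- file.path(dir, files)
  paths <- paths[file.exists(paths)]
  frames <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  same_cols <- vapply(frames, function(d) identical(names(d), names(frames[[1]])), logical(1))
  stopifnot(all(same_cols))
  do.call(rbind, frames)
}

# Tiny demo with two made-up files in a temporary folder.
demo_dir <- tempfile("trips")
dir.create(demo_dir)
write.csv(data.frame(ride_id = c("a", "b"), member_casual = c("member", "casual")),
          file.path(demo_dir, file_names[1]), row.names = FALSE)
write.csv(data.frame(ride_id = "c", member_casual = "casual"),
          file.path(demo_dir, file_names[2]), row.names = FALSE)
demo_combined <- read_trip_files(demo_dir, file_names)
```

The column-name check makes the "same structure" finding explicit instead of relying on reading twelve import messages by eye.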

  1. Drop “NA” to avoid misleading data, assuming entries with “NA” are not real rides by users
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ dplyr   1.0.9
## ✔ tibble  3.1.6     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ✔ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
trip_data_nadrop<-trip_data_combined %>% drop_na()
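
Before dropping rows, it may help to see which columns the “NA” values come from and how many rows the drop removes. A minimal sketch on a made-up data frame (the values are illustrative only, and base R's `complete.cases` stands in for `drop_na()`):

```r
# Made-up rows; only ride "a" has no missing values.
demo <- data.frame(
  ride_id = c("a", "b", "c", "d"),
  start_station_name = c("X", NA, "Y", NA),
  end_lat = c(41.9, 41.8, NA, 41.7),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(demo))

# Rows that would survive dropping every row with at least one NA.
rows_kept <- sum(complete.cases(demo))
share_dropped <- 1 - rows_kept / nrow(demo)
```

If most missing values sit in the station name columns, the assumption that such rows are not real rides is worth confirming with stakeholders.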
  1. Add a new column “ride_length” holding the time interval between “started_at” and “ended_at”, and save the resulting data frame as “trip_data_nadrop_ridelength”
x<-trip_data_nadrop
x$ride_length<-difftime(trip_data_nadrop$ended_at,trip_data_nadrop$started_at)
trip_data_nadrop_ridelength<-x
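
One detail worth noting: `difftime()` picks its units automatically from the smallest absolute difference unless `units` is given. The sketch below uses two made-up rides to show the difference; passing `units = "secs"` is a suggestion here, not something the original code does.

```r
# Two made-up rides: 30 minutes and 2 hours long.
started <- as.POSIXct(c("2021-04-01 08:00:00", "2021-04-01 09:00:00"), tz = "UTC")
ended   <- as.POSIXct(c("2021-04-01 08:30:00", "2021-04-01 11:00:00"), tz = "UTC")

# Units chosen automatically (minutes here, because the shortest ride is 30 minutes).
auto_len <- difftime(ended, started)

# Units fixed to seconds, so comparisons and summaries never depend on the data.
secs_len <- difftime(ended, started, units = "secs")
secs_num <- as.numeric(secs_len)
```

In the full data set the automatic choice ends up as seconds (as the summary tables show), but fixing the unit removes the dependence on the data.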
  1. Find rides with “ride_length”<=0. Any further filtering would require investigating the data set with stakeholders
outliers<-filter(trip_data_nadrop_ridelength,ride_length<=0)

Findings: There are 207 rides with “ride_length”<=0

  1. Filter out rides with “ride_length”<=0
trip_data_nadrop_ridelength_nooutliers<-filter(trip_data_nadrop_ridelength,ride_length>0)
  1. Create new columns: “weekday”, “month”, “quarter”, “hour”
x<-trip_data_nadrop_ridelength_nooutliers
x$weekday<-weekdays(trip_data_nadrop_ridelength_nooutliers$started_at)
x$month<-months(trip_data_nadrop_ridelength_nooutliers$started_at)
x$quarter<-quarters(trip_data_nadrop_ridelength_nooutliers$started_at)
x$hour<-format(trip_data_nadrop_ridelength_nooutliers$started_at,"%H")
trip_data_clean<-x
remove(x)
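
`weekdays()` and `months()` return names in the current locale, and the resulting character columns sort alphabetically. The following is a sketch of one locale-independent alternative on two made-up timestamps; the variable names are illustrative and not taken from the analysis.

```r
started_at <- as.POSIXct(c("2021-04-03 10:15:00", "2021-04-05 17:40:00"), tz = "UTC")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), independent of locale.
names_sunday_first <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                        "Thursday", "Friday", "Saturday")
weekday_chr <- names_sunday_first[as.POSIXlt(started_at)$wday + 1]

# Ordered factor so tables and plots run Monday to Sunday instead of alphabetically.
weekday <- factor(weekday_chr,
                  levels = c(names_sunday_first[2:7], "Sunday"),
                  ordered = TRUE)

# Hour as an integer (0-23) rather than a character string.
hour <- as.integer(format(started_at, "%H"))
```

With an ordered factor stored in the data, the plotting code would not need to repeat the list of day names.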

Visualizations and key findings

  1. Which user type bikes more?
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=member_casual,fill=member_casual))+labs(title=("Which user type bikes more?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="User type", y="Number of rides",caption = "1e+06=1,000,000, Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")

trip_data_clean %>% group_by(member_casual) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100,total_ride_length=sum(ride_length),average_ride_length=mean(ride_length))
## # A tibble: 2 × 5
##   member_casual   count   `%` total_ride_length average_ride_length
##   <chr>           <int> <dbl> <drtn>            <drtn>             
## 1 casual        2044256  44.0 3931016968 secs   1922.9573 secs     
## 2 member        2596932  56.0 2016519224 secs    776.5006 secs

Findings:

  1. Members took about 56% of rides from 2021-04 to 2022-03, versus 44% for casual riders. On average, however, rides by casual riders last much longer than rides by members. Casual riders either bike much more slowly or cover much longer distances than members.
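
The average ride lengths in the table are easier to compare in minutes. A short arithmetic sketch, with the two averages copied from the summary table above:

```r
# Average ride length in seconds, copied from the summary table.
avg_secs <- c(casual = 1922.9573, member = 776.5006)

# Convert to minutes, rounded to one decimal place.
avg_mins <- round(avg_secs / 60, 1)

# How many times longer the average casual ride is than the average member ride.
ratio <- round(avg_secs[["casual"]] / avg_secs[["member"]], 2)
```

This gives about 32.0 minutes for casual riders and 12.9 minutes for members, a ratio of roughly 2.5.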
  1. Which bike type do riders like the most and how are different bikes used differently?
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=member_casual,fill=member_casual))+labs(title=("Which bike type do riders like the most?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="User type", y="Number of rides",caption = "Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")+facet_wrap(~rideable_type)

trip_data_clean %>% group_by(member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 4
## # Groups:   member_casual [2]
##   member_casual rideable_type   count   `%`
##   <chr>         <chr>           <int> <dbl>
## 1 member        classic_bike  1989279 42.9 
## 2 casual        classic_bike  1252558 27.0 
## 3 member        electric_bike  607653 13.1 
## 4 casual        electric_bike  488183 10.5 
## 5 casual        docked_bike    303515  6.54

Findings:

  1. Classic bikes are the most popular bike type for both members and casual riders.

  2. Docked bikes are the least popular type and, interestingly, are only taken by casual riders. More data on the pricing plans for each bike type, along with financial data, is needed to find out why.

  1. How do riders behave differently each day of the week?
ggplot(data = trip_data_clean)+geom_bar(position = "dodge",mapping = aes(x=factor(weekday, levels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')),fill=member_casual))+labs(title=("How do riders behave differently each day of the week?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="Day of the week", y="Number of rides",caption = "1e+06=1,000,000, Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")

trip_data_clean %>% group_by(member_casual,weekday) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday    count   `%`
##    <chr>         <chr>      <int> <dbl>
##  1 casual        Saturday  458032  9.87
##  2 member        Wednesday 411738  8.87
##  3 member        Tuesday   403205  8.69
##  4 casual        Sunday    402105  8.66
##  5 member        Thursday  388455  8.37
##  6 member        Friday    366695  7.90
##  7 member        Monday    360579  7.77
##  8 member        Saturday  350489  7.55
##  9 member        Sunday    315771  6.80
## 10 casual        Friday    288223  6.21
## 11 casual        Monday    231686  4.99
## 12 casual        Thursday  228205  4.92
## 13 casual        Wednesday 222069  4.78
## 14 casual        Tuesday   213936  4.61

Findings:

  1. Members mostly bike on weekdays, peaking on Wednesday and hitting their lows on weekends.

  2. Casual riders bike mostly on weekends, with usage starting to increase on Friday. They bike much less on the other weekdays (Monday, Tuesday, Wednesday, Thursday), when the number of rides stays relatively low and consistent.

  3. On weekdays, members bike more than casual riders, while on weekends casual riders bike more than members.
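
The percentages in the table are shares of all rides, so the weekday pattern within each user type is easier to see after normalizing per group. A small sketch using the counts copied from the table above (the data frame `counts` is built by hand for illustration):

```r
# Ride counts copied from the weekday table above.
counts <- data.frame(
  member_casual = rep(c("casual", "member"), each = 7),
  weekday = rep(c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"), times = 2),
  count = c(231686, 213936, 222069, 228205, 288223, 458032, 402105,
            360579, 403205, 411738, 388455, 366695, 350489, 315771)
)

is_weekend <- counts$weekday %in% c("Saturday", "Sunday")

# Total rides and weekend rides per user type (named numeric vectors).
total_rides <- sapply(split(counts$count, counts$member_casual), sum)
weekend_rides <- sapply(split(counts$count[is_weekend],
                              counts$member_casual[is_weekend]), sum)

# Share of each group's rides that fall on Saturday or Sunday, in percent.
weekend_share <- round(100 * weekend_rides / total_rides, 1)
```

With these counts, about 42.1% of casual rides and 25.7% of member rides fall on a weekend.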

  1. How do riders behave differently across all hours of each day of the week?
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=hour,fill=member_casual))+labs(title=("How do riders behave differently across all hours of each day of the week?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="Hours of the day", y="Number of rides",caption = "Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")+facet_wrap(factor(weekday,levels = c('Monday','Tuesday','Wednesday','Thursday', 'Friday','Saturday','Sunday'))~.)+theme(axis.text.x = element_text(angle = 90,size = 7))

trip_data_clean %>% group_by(member_casual,hour) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(hour))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 48 × 4
## # Groups:   member_casual [2]
##    member_casual hour   count   `%`
##    <chr>         <chr>  <int> <dbl>
##  1 casual        23     58820 1.27 
##  2 member        23     40777 0.879
##  3 casual        22     76629 1.65 
##  4 member        22     60496 1.30 
##  5 casual        21     82936 1.79 
##  6 member        21     80117 1.73 
##  7 casual        20     97924 2.11 
##  8 member        20    108920 2.35 
##  9 casual        19    135346 2.92 
## 10 member        19    164261 3.54 
## # … with 38 more rows

Findings:

  1. On weekdays, members and casual riders show very similar patterns: the number of rides reaches its first peak of the day at 8am, dips slightly from 9am to 10am, then rises from 11am to a second peak around 5pm.

  2. On weekends, the number of rides doesn’t reach its normal weekday level until 9am or 10am, then keeps rising and stays relatively high until 6pm or 7pm.

  1. How do riders behave differently across quarters?
ggplot(data = trip_data_clean)+geom_bar(position="dodge",mapping = aes(x=quarter,fill=member_casual))+labs(title=("How do riders behave differently across quarters?"),subtitle ="Casual vs Member", x="Quarter", y="Number of rides")+scale_fill_discrete(name="User type")

trip_data_clean %>% group_by(member_casual,quarter) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups:   member_casual [2]
##   member_casual quarter   count   `%`
##   <chr>         <chr>     <int> <dbl>
## 1 casual        Q3      1003784 21.6 
## 2 member        Q3       983989 21.2 
## 3 member        Q2       716515 15.4 
## 4 casual        Q2       641425 13.8 
## 5 member        Q4       606053 13.1 
## 6 casual        Q4       304149  6.55
## 7 member        Q1       290375  6.26
## 8 casual        Q1        94898  2.04

Findings:

  1. Rides peaked in Q3 and bottomed out in Q1 for both members and casual riders.

  2. Interestingly, Q3 is the only quarter in which casual riders took more rides than members.

  1. How do riders behave differently across all months of the year?
ggplot(data = trip_data_clean)+geom_bar(position="dodge",mapping = aes(x=factor(month,levels = (c('January','February','March','April','May','June','July','August','September','October','November','December'))),fill=member_casual))+labs(title=("How do riders behave differently across all months of the year?"),subtitle ="Casual vs Member", x="Month", y="Number of rides")+scale_fill_discrete(name="User type")+theme(axis.text.x = element_text(angle = 45))

Findings:

  1. Throughout the year, the number of rides taken by members and by casual riders again moved in the same direction.

  2. The number of rides by members peaked in August, while the number of rides by casual riders peaked a little earlier, in July.
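
The month plot relies on a hand-typed vector of English month names to order the x axis. Base R ships the same vector as the built-in constant `month.name`, which could be used instead. A minimal sketch on a few made-up values:

```r
# Month names as months() would return them in an English locale, in arbitrary order.
month_chr <- c("July", "April", "January", "August")

# Use the built-in constant as factor levels so months sort in calendar order.
month <- factor(month_chr, levels = month.name)
sorted_months <- as.character(sort(month))
```

Here `sorted_months` comes out in calendar order (January, April, July, August) rather than alphabetical order.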

  1. Where do riders start their rides and where do they go?

First, let’s get the map of Chicago. To decide the ‘bbox’ values, I looked up the coordinates on OpenStreetMap.

library(ggplot2)
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
install.packages("ggmap")
## Installing ggmap [3.0.0] ...
##  OK [linked cache]
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
map_chicago<-get_stamenmap(bbox = c(left=-88.3, bottom=41.3, right=-87.0, top=42.4), maptype="terrain",zoom = 11)
## 81 tiles needed, this may take a while (try a smaller zoom).
ggmap(map_chicago)

Second, plot starting points and ending points on the map of Chicago

ggmap(map_chicago)+geom_jitter(trip_data_clean,mapping = aes(x=start_lng,y=start_lat),color="yellow")+facet_wrap(~member_casual)+labs(title = "Where do riders start their rides?",subtitle = "Casual vs Member from 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
## Warning: Removed 1 rows containing missing values (geom_point).

ggmap(map_chicago)+geom_jitter(trip_data_clean,mapping = aes(x=end_lng,y=end_lat),color="red")+facet_wrap(~member_casual)+labs(title = "Where do riders go with their rides?",subtitle = "Casual vs Member from 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))

Findings:

  1. Members and casual riders show very similar patterns in where they start their rides (in yellow) and where they end them (in red). Occasionally, however, casual riders are more likely than members to ride to places along the coast that are farther from the city center.
  1. How do users use different bikes?
ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="member"),mapping = aes(x=end_lng,y=end_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do members go with their rides?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))

ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="casual"),mapping = aes(x=end_lng,y=end_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do the casual go with their rides?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))

ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="member"),mapping = aes(x=start_lng,y=start_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do members start their rides with different bikes?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))

ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="casual"),mapping = aes(x=start_lng,y=start_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do the casual start their rides with different bikes?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
## Warning: Removed 1 rows containing missing values (geom_point).

trip_data_clean %>% group_by(end_station_name,member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'end_station_name', 'member_casual'. You
## can override using the `.groups` argument.
## # A tibble: 3,746 × 5
## # Groups:   end_station_name, member_casual [1,691]
##    end_station_name         member_casual rideable_type count   `%`
##    <chr>                    <chr>         <chr>         <int> <dbl>
##  1 Streeter Dr & Grand Ave  casual        classic_bike  38496 0.829
##  2 Streeter Dr & Grand Ave  casual        docked_bike   20226 0.436
##  3 Clark St & Elm St        member        classic_bike  18720 0.403
##  4 Michigan Ave & Oak St    casual        classic_bike  18090 0.390
##  5 Millennium Park          casual        classic_bike  18046 0.389
##  6 Wells St & Concord Ln    member        classic_bike  17988 0.388
##  7 Kingsbury St & Kinzie St member        classic_bike  17894 0.386
##  8 Wells St & Elm St        member        classic_bike  15885 0.342
##  9 Broadway & Barry Ave     member        classic_bike  14364 0.309
## 10 Theater on the Lake      casual        classic_bike  14222 0.306
## # … with 3,736 more rows

Findings:

  1. Electric bikes are most likely chosen for longer rides.
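
One way to check a finding like this numerically is to compute the straight-line distance between the start and end coordinates and average it per bike type. The sketch below is an assumption about how that could be done, not the method used in this report: `haversine_km` is a hypothetical helper, the Earth radius of 6371 km is a common approximation, and the demo rows are made up.

```r
# Great-circle (haversine) distance in kilometres between two points.
haversine_km <- function(lat1, lng1, lat2, lng2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Made-up rides using the same column names as the trip data.
demo <- data.frame(
  rideable_type = c("classic_bike", "classic_bike", "electric_bike", "electric_bike"),
  start_lat = c(41.89, 41.90, 41.88, 41.95),
  start_lng = c(-87.62, -87.63, -87.64, -87.65),
  end_lat   = c(41.90, 41.91, 41.93, 41.88),
  end_lng   = c(-87.62, -87.63, -87.64, -87.65)
)

# Distance per ride, then the mean distance per bike type.
demo$distance_km <- with(demo, haversine_km(start_lat, start_lng, end_lat, end_lng))
mean_distance <- sapply(split(demo$distance_km, demo$rideable_type), mean)
```

This measures displacement, not the route actually ridden, so round trips that end where they started count as zero distance.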

Summary of the analysis

  1. Which user type bikes more?

From the data provided, we can clearly see that members took more rides (56%) than casual riders (44%). However, we don’t know how many distinct members or casual riders took those rides. Unique user IDs linked to rides should be requested in order to find out how often each group bikes.

  1. How long do they bike?

From the summary table above, casual riders spent on average 1923 secs (about 32 mins) per ride, while members spent 777 secs (about 13 mins) per ride. This is consistent with the hypothesis that casual riders are more likely to ride for leisure, while members are more likely to ride to commute efficiently.

  1. Where do they bike?

On the maps of starting and ending points, the shapes for members and casual riders are nearly identical except for a few outliers. In other words, riders bike in largely the same areas regardless of user type, except that casual riders occasionally bike farther from the city center along the coast. These outliers are also consistent with casual riders being more likely to bike for leisure.

  1. When do they bike?

Members tend to bike more on weekdays, commuting to work from 7 to 8 am and back from 5 to 6 pm. Rides by members are less sensitive to season and weather changes, because the bikes serve as necessary vehicles for commuting. Casual riders, on the other hand, tend to bike for leisure, mostly on Fridays and weekends, and are more sensitive to season and weather changes. For both members and casual riders, the majority of rides happened in the afternoon, before evening.

  1. Which type of bike do they choose?

Almost 70% of all rides are on classic bikes: 42.9% by members and 27.0% by casual riders. About 6.5% of rides are on docked bikes, all by casual riders. The remaining 23.6% are on electric bikes: 13.1% by members and 10.5% by casual riders. Combining these results with additional data on pricing plans for each bike type, along with financial data, should help refine Cyclistic’s offerings to convert casual riders into annual members.
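
The shares per bike type can be recomputed directly from the counts in the bike type table. A short sketch, with the counts copied by hand from that table:

```r
# Ride counts copied from the bike type table above.
counts <- data.frame(
  member_casual = c("member", "casual", "member", "casual", "casual"),
  rideable_type = c("classic_bike", "classic_bike", "electric_bike",
                    "electric_bike", "docked_bike"),
  count = c(1989279, 1252558, 607653, 488183, 303515)
)

# Total rides per bike type, then each type's share of all rides in percent.
type_totals <- sapply(split(counts$count, counts$rideable_type), sum)
type_share <- round(100 * type_totals / sum(counts$count), 1)
```

This gives about 69.8% for classic bikes, 23.6% for electric bikes and 6.5% for docked bikes.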

  1. Why do they bike?

Based on the analysis above, members probably bike both for leisure and to commute, while casual riders mainly bike for leisure.

Top three recommendations

  1. Marketing campaigns should focus on biking for leisure, centered on classic bikes and docked bikes. The top destinations for casual riders are listed below:
trip_data_clean %>% filter(member_casual=="casual") %>% group_by(end_station_name,member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'end_station_name', 'member_casual'. You
## can override using the `.groups` argument.
## # A tibble: 2,224 × 5
## # Groups:   end_station_name, member_casual [852]
##    end_station_name                   member_casual rideable_type count   `%`
##    <chr>                              <chr>         <chr>         <int> <dbl>
##  1 Streeter Dr & Grand Ave            casual        classic_bike  38496 0.829
##  2 Streeter Dr & Grand Ave            casual        docked_bike   20226 0.436
##  3 Michigan Ave & Oak St              casual        classic_bike  18090 0.390
##  4 Millennium Park                    casual        classic_bike  18046 0.389
##  5 Theater on the Lake                casual        classic_bike  14222 0.306
##  6 Wells St & Concord Ln              casual        classic_bike  12400 0.267
##  7 DuSable Lake Shore Dr & North Blvd casual        classic_bike  12348 0.266
##  8 Shedd Aquarium                     casual        classic_bike  11393 0.245
##  9 Clark St & Lincoln Ave             casual        classic_bike  10948 0.236
## 10 Lake Shore Dr & North Blvd         casual        classic_bike  10785 0.232
## # … with 2,214 more rows
  2. More budget should be allocated to Friday and weekend afternoons, especially in Q3.
  3. New bike types and pricing plans tailored to the nature of casual riders (biking for leisure) could be offered, for example, dualie bikes with weekly/monthly/quarterly plans.